fix(qqofficial): convert non-WAV audio to PCM WAV before Tencent Silk encoding#6867
fix(qqofficial): convert non-WAV audio to PCM WAV before Tencent Silk encoding#6867BillionClaw wants to merge 1 commit intoAstrBotDevs:masterfrom
Conversation
… encoding Fixes AstrBotDevs#6853 Root cause: When a Record message arrives, convert_to_file_path() may return a non-WAV file (MP3, etc.), but wav_to_tencent_silk() calls wave.open() which requires RIFF WAV format. The convert_to_pcm_wav() helper exists but was never called for incoming Record messages. Fix: Check the file extension of the Record audio path. If not .wav, call convert_to_pcm_wav() to convert to PCM 16bit WAV at 24000Hz mono before passing to wav_to_tencent_silk().
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where the QQ bot failed to process non-WAV audio files after a recent version upgrade. The core problem was that the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The audio type check relies solely on the file extension; if
convert_to_file_path()can return paths without extensions or misleading ones, consider either always passing throughconvert_to_pcm_wavor doing a more robust content-based format check before deciding to skip conversion. - The nested
try/exceptblocks aroundconvert_to_pcm_wavandwav_to_tencent_silksetrecord_file_path = Nonein several places; you could simplify the control flow by early-returning/continuing on failure or extracting this logic into a small helper to centralize the error handling and assignment.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The audio type check relies solely on the file extension; if `convert_to_file_path()` can return paths without extensions or misleading ones, consider either always passing through `convert_to_pcm_wav` or doing a more robust content-based format check before deciding to skip conversion.
- The nested `try/except` blocks around `convert_to_pcm_wav` and `wav_to_tencent_silk` set `record_file_path = None` in several places; you could simplify the control flow by early-returning/continuing on failure or extracting this logic into a small helper to centralize the error handling and assignment.
## Individual Comments
### Comment 1
<location path="astrbot/core/platform/sources/qqofficial/qqofficial_message_event.py" line_range="617-619" />
<code_context>
+ # 检查是否为 WAV 文件,非 WAV 需要先转换
+ ext = os.path.splitext(record_wav_path)[1].lower()
+ if ext != ".wav":
+ temp_wav_path = os.path.join(
+ temp_dir,
+ f"qqofficial_{uuid.uuid4()}.wav",
)
- if duration > 0:
</code_context>
<issue_to_address>
**suggestion (performance):** Consider cleaning up the intermediate WAV file after conversion to avoid temp-file buildup.
Since `temp_wav_path` is only used for the conversion, it should be removed once `wav_to_tencent_silk` finishes (whether it succeeds or fails). Otherwise, under sustained usage these temp files can accumulate and consume unnecessary disk space. Consider adding cleanup logic for this path.
Suggested implementation:
```python
temp_dir,
f"qqofficial_{uuid.uuid4()}.silk",
)
# 检查是否为 WAV 文件,非 WAV 需要先转换
temp_wav_path: str | None = None
ext = os.path.splitext(record_wav_path)[1].lower()
if ext != ".wav":
temp_wav_path = os.path.join(
temp_dir,
f"qqofficial_{uuid.uuid4()}.wav",
)
try:
record_wav_path = await convert_to_pcm_wav(
record_wav_path,
temp_wav_path,
)
except Exception as e:
logger.error(f"转换音频为WAV格式时出错: {e}")
# 转换失败时清理中间 WAV 文件,避免临时文件堆积
if temp_wav_path and os.path.exists(temp_wav_path):
try:
os.remove(temp_wav_path)
except Exception as cleanup_err:
logger.warning(
f"清理临时 WAV 文件失败 {temp_wav_path}: {cleanup_err}"
)
record_file_path = None
record_wav_path = None
```
To fully implement your suggestion ("once `wav_to_tencent_silk` finishes"), you should also:
1. Locate the code that calls `wav_to_tencent_silk(record_wav_path, temp_silk_path, ...)` (or similar).
2. After that call completes (in a `finally` block or after error-handling), add cleanup logic:
```python
if temp_wav_path and os.path.exists(temp_wav_path):
try:
os.remove(temp_wav_path)
except Exception as cleanup_err:
logger.warning(f"清理临时 WAV 文件失败 {temp_wav_path}: {cleanup_err}")
```
3. Ensure `temp_wav_path` remains in scope where `wav_to_tencent_silk` is called (e.g., by not shadowing/redefining it and keeping all related logic in the same function or closure).
This will guarantee the temporary WAV file is removed both on success and failure of the Silk conversion step.
</issue_to_address>
### Comment 2
<location path="astrbot/core/platform/sources/qqofficial/qqofficial_message_event.py" line_range="611-620" />
<code_context>
+ try:
</code_context>
<issue_to_address>
**suggestion (performance):** Silk temp file might remain on disk when conversion fails, which could accumulate over time.
If `wav_to_tencent_silk` fails after partially writing `record_tecent_silk_path`, the unused temp file is left on disk. Please add a cleanup path (e.g., delete `record_tecent_silk_path` in the `except` block) to prevent temp file buildup in long-running processes.
Suggested implementation:
```python
try:
await wav_to_tencent_silk(record_wav_path, record_tecent_silk_path)
except Exception as e:
# Cleanup partially written silk file to avoid temp file buildup
if os.path.exists(record_tecent_silk_path):
try:
os.remove(record_tecent_silk_path)
except OSError:
# Best-effort cleanup; log and continue raising original error
logger.warning(
"Failed to remove temporary silk file %s after conversion error",
record_tecent_silk_path,
)
logger.exception("Failed to convert wav to tencent silk: %s", e)
raise
```
1. This change assumes `os` and `logger` are already imported and used in this module (they appear to be, given the existing snippet). If not, add `import os` and a module-level `logger` consistent with the rest of the codebase.
2. If the exact logging message in the original `except` block differs, adjust the `SEARCH` and `REPLACE` strings accordingly while preserving the added cleanup logic around `os.path.exists` / `os.remove`.
</issue_to_address>
### Comment 3
<location path="astrbot/core/platform/sources/qqofficial/qqofficial_message_event.py" line_range="606" />
<code_context>
image_base64 = image_base64.removeprefix("base64://")
elif isinstance(i, Record):
if i.file:
- record_wav_path = await i.convert_to_file_path() # wav 路径
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the audio conversion and WAV-normalization logic into focused helper functions so this block becomes a simple, linear call that just assigns the final `record_file_path`.
You can reduce the branching and state juggling here by pushing the conversion logic into a small helper and using linear flow instead of mutating `record_wav_path` / `record_file_path` in multiple branches.
For example, extract the audio handling into a dedicated helper:
```python
async def _record_to_tencent_silk(record: Record) -> str | None:
src_path = await record.convert_to_file_path() # may not be wav
temp_dir = get_astrbot_temp_path()
silk_path = os.path.join(temp_dir, f"qqofficial_{uuid.uuid4()}.silk")
try:
wav_path = await _ensure_wav(src_path, temp_dir)
if wav_path is None:
return None
duration = await wav_to_tencent_silk(wav_path, silk_path)
if duration <= 0:
logger.error("转换音频格式时出错:音频时长不大于0")
return None
return silk_path
except Exception as e:
logger.error(f"处理语音时出错: {e}")
return None
```
with a focused WAV normalizer:
```python
async def _ensure_wav(src_path: str, temp_dir: str) -> str | None:
ext = os.path.splitext(src_path)[1].lower()
if ext == ".wav":
return src_path
temp_wav_path = os.path.join(temp_dir, f"qqofficial_{uuid.uuid4()}.wav")
try:
return await convert_to_pcm_wav(src_path, temp_wav_path)
except Exception as e:
logger.error(f"转换音频为WAV格式时出错: {e}")
return None
```
Then the calling site becomes much simpler and only handles the final state:
```python
elif isinstance(i, Record):
if i.file:
record_file_path = await _record_to_tencent_silk(i)
else:
record_file_path = None
```
This keeps all existing behavior (non-WAV support, duration check, logging) but:
- Eliminates using `record_wav_path` / `record_file_path` as control-flow flags.
- Flattens nested `if` / `try` blocks into a linear flow.
- Centralizes error handling so the event handler only needs one assignment and no duplicated `record_file_path = None`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| temp_wav_path = os.path.join( | ||
| temp_dir, | ||
| f"qqofficial_{uuid.uuid4()}.wav", |
There was a problem hiding this comment.
suggestion (performance): Consider cleaning up the intermediate WAV file after conversion to avoid temp-file buildup.
Since temp_wav_path is only used for the conversion, it should be removed once wav_to_tencent_silk finishes (whether it succeeds or fails). Otherwise, under sustained usage these temp files can accumulate and consume unnecessary disk space. Consider adding cleanup logic for this path.
Suggested implementation:
temp_dir,
f"qqofficial_{uuid.uuid4()}.silk",
)
# 检查是否为 WAV 文件,非 WAV 需要先转换
temp_wav_path: str | None = None
ext = os.path.splitext(record_wav_path)[1].lower()
if ext != ".wav":
temp_wav_path = os.path.join(
temp_dir,
f"qqofficial_{uuid.uuid4()}.wav",
)
try:
record_wav_path = await convert_to_pcm_wav(
record_wav_path,
temp_wav_path,
)
except Exception as e:
logger.error(f"转换音频为WAV格式时出错: {e}")
# 转换失败时清理中间 WAV 文件,避免临时文件堆积
if temp_wav_path and os.path.exists(temp_wav_path):
try:
os.remove(temp_wav_path)
except Exception as cleanup_err:
logger.warning(
f"清理临时 WAV 文件失败 {temp_wav_path}: {cleanup_err}"
)
record_file_path = None
record_wav_path = NoneTo fully implement your suggestion ("once wav_to_tencent_silk finishes"), you should also:
-
Locate the code that calls
wav_to_tencent_silk(record_wav_path, temp_silk_path, ...)(or similar). -
After that call completes (in a
finallyblock or after error-handling), add cleanup logic:if temp_wav_path and os.path.exists(temp_wav_path): try: os.remove(temp_wav_path) except Exception as cleanup_err: logger.warning(f"清理临时 WAV 文件失败 {temp_wav_path}: {cleanup_err}")
-
Ensure
temp_wav_pathremains in scope wherewav_to_tencent_silkis called (e.g., by not shadowing/redefining it and keeping all related logic in the same function or closure).
This will guarantee the temporary WAV file is removed both on success and failure of the Silk conversion step.
| temp_dir, | ||
| f"qqofficial_{uuid.uuid4()}.silk", | ||
| ) | ||
| try: | ||
| duration = await wav_to_tencent_silk( | ||
| record_wav_path, | ||
| record_tecent_silk_path, | ||
| # 检查是否为 WAV 文件,非 WAV 需要先转换 | ||
| ext = os.path.splitext(record_wav_path)[1].lower() | ||
| if ext != ".wav": | ||
| temp_wav_path = os.path.join( | ||
| temp_dir, | ||
| f"qqofficial_{uuid.uuid4()}.wav", | ||
| ) |
There was a problem hiding this comment.
suggestion (performance): Silk temp file might remain on disk when conversion fails, which could accumulate over time.
If wav_to_tencent_silk fails after partially writing record_tecent_silk_path, the unused temp file is left on disk. Please add a cleanup path (e.g., delete record_tecent_silk_path in the except block) to prevent temp file buildup in long-running processes.
Suggested implementation:
try:
await wav_to_tencent_silk(record_wav_path, record_tecent_silk_path)
except Exception as e:
# Cleanup partially written silk file to avoid temp file buildup
if os.path.exists(record_tecent_silk_path):
try:
os.remove(record_tecent_silk_path)
except OSError:
# Best-effort cleanup; log and continue raising original error
logger.warning(
"Failed to remove temporary silk file %s after conversion error",
record_tecent_silk_path,
)
logger.exception("Failed to convert wav to tencent silk: %s", e)
raise- This change assumes
osandloggerare already imported and used in this module (they appear to be, given the existing snippet). If not, addimport osand a module-levelloggerconsistent with the rest of the codebase. - If the exact logging message in the original
exceptblock differs, adjust theSEARCHandREPLACEstrings accordingly while preserving the added cleanup logic aroundos.path.exists/os.remove.
| @@ -602,22 +605,41 @@ async def _parse_to_qqofficial(message: MessageChain): | |||
| image_base64 = image_base64.removeprefix("base64://") | |||
| elif isinstance(i, Record): | |||
There was a problem hiding this comment.
issue (complexity): Consider extracting the audio conversion and WAV-normalization logic into focused helper functions so this block becomes a simple, linear call that just assigns the final record_file_path.
You can reduce the branching and state juggling here by pushing the conversion logic into a small helper and using linear flow instead of mutating record_wav_path / record_file_path in multiple branches.
For example, extract the audio handling into a dedicated helper:
async def _record_to_tencent_silk(record: Record) -> str | None:
src_path = await record.convert_to_file_path() # may not be wav
temp_dir = get_astrbot_temp_path()
silk_path = os.path.join(temp_dir, f"qqofficial_{uuid.uuid4()}.silk")
try:
wav_path = await _ensure_wav(src_path, temp_dir)
if wav_path is None:
return None
duration = await wav_to_tencent_silk(wav_path, silk_path)
if duration <= 0:
logger.error("转换音频格式时出错:音频时长不大于0")
return None
return silk_path
except Exception as e:
logger.error(f"处理语音时出错: {e}")
return Nonewith a focused WAV normalizer:
async def _ensure_wav(src_path: str, temp_dir: str) -> str | None:
ext = os.path.splitext(src_path)[1].lower()
if ext == ".wav":
return src_path
temp_wav_path = os.path.join(temp_dir, f"qqofficial_{uuid.uuid4()}.wav")
try:
return await convert_to_pcm_wav(src_path, temp_wav_path)
except Exception as e:
logger.error(f"转换音频为WAV格式时出错: {e}")
return NoneThen the calling site becomes much simpler and only handles the final state:
elif isinstance(i, Record):
if i.file:
record_file_path = await _record_to_tencent_silk(i)
else:
record_file_path = NoneThis keeps all existing behavior (non-WAV support, duration check, logging) but:
- Eliminates using
record_wav_path/record_file_pathas control-flow flags. - Flattens nested
if/tryblocks into a linear flow. - Centralizes error handling so the event handler only needs one assignment and no duplicated
record_file_path = None.
There was a problem hiding this comment.
Code Review
This pull request correctly addresses an issue where non-WAV audio files failed to process for QQ official bots by adding a conversion step to PCM WAV format. My review identifies a resource leak where the temporary WAV file created during this conversion is not cleaned up. I've provided a refactoring suggestion that uses a try...finally block to ensure this temporary file is always removed after use, which also improves the overall error handling structure.
| # 检查是否为 WAV 文件,非 WAV 需要先转换 | ||
| ext = os.path.splitext(record_wav_path)[1].lower() | ||
| if ext != ".wav": | ||
| temp_wav_path = os.path.join( | ||
| temp_dir, | ||
| f"qqofficial_{uuid.uuid4()}.wav", | ||
| ) | ||
| if duration > 0: | ||
| record_file_path = record_tecent_silk_path | ||
| try: | ||
| record_wav_path = await convert_to_pcm_wav( | ||
| record_wav_path, | ||
| temp_wav_path, | ||
| ) | ||
| except Exception as e: | ||
| logger.error(f"转换音频为WAV格式时出错: {e}") | ||
| record_file_path = None | ||
| record_wav_path = None | ||
| try: | ||
| if record_wav_path: | ||
| duration = await wav_to_tencent_silk( | ||
| record_wav_path, | ||
| record_tecent_silk_path, | ||
| ) | ||
| if duration > 0: | ||
| record_file_path = record_tecent_silk_path | ||
| else: | ||
| record_file_path = None | ||
| logger.error("转换音频格式时出错:音频时长不大于0") | ||
| else: | ||
| record_file_path = None | ||
| logger.error("转换音频格式时出错:音频时长不大于0") | ||
| except Exception as e: | ||
| logger.error(f"处理语音时出错: {e}") | ||
| record_file_path = None |
There was a problem hiding this comment.
The temporary WAV file created during audio conversion is not being cleaned up, which can lead to an accumulation of files in the temporary directory. This is a resource leak.
I suggest refactoring this block to use a try...finally structure. This ensures that the temporary WAV file is always deleted after the Silk conversion is complete, even if errors occur. The proposed change also consolidates the logic and improves readability.
converted_wav_path = None
try:
# 检查是否为 WAV 文件,非 WAV 需要先转换
ext = os.path.splitext(record_wav_path)[1].lower()
if ext != ".wav":
temp_wav_path = os.path.join(
temp_dir,
f"qqofficial_{uuid.uuid4()}.wav",
)
converted_wav_path = temp_wav_path
try:
record_wav_path = await convert_to_pcm_wav(
record_wav_path,
temp_wav_path,
)
except Exception as e:
logger.error(f"转换音频为WAV格式时出错: {e}")
record_wav_path = None
if record_wav_path:
duration = await wav_to_tencent_silk(
record_wav_path,
record_tecent_silk_path,
)
if duration > 0:
record_file_path = record_tecent_silk_path
else:
record_file_path = None
logger.error("转换音频格式时出错:音频时长不大于0")
else:
record_file_path = None
except Exception as e:
logger.error(f"处理语音时出错: {e}")
record_file_path = None
finally:
if converted_wav_path and os.path.exists(converted_wav_path):
os.remove(converted_wav_path)|
Thank you for the review and merge! Glad to help. |
|
Thank you for your comment! I'd be happy to address any questions or make adjustments. Please let me know what changes would be helpful. |
|
Thank you for the review and merge! Glad to help. |
|
Thank you for the review! Let me know if you need any changes. |
Summary
Fixes #6853: After upgrading from 4.19.5 to 4.20+, sending voice or audio files to the QQ bot produces no response.
Root Cause
When a
Recordmessage arrives,convert_to_file_path()may return a non-WAV file (MP3, etc.), butwav_to_tencent_silk()callswave.open()which requires RIFF WAV format. Theconvert_to_pcm_wav()helper exists but was never called for incomingRecordmessages in_parse_to_qqofficial().Fix
Check the file extension of the Record audio path. If not
.wav, callconvert_to_pcm_wav()to convert to PCM 16bit WAV at 24000Hz mono before passing towav_to_tencent_silk(). Also importconvert_to_pcm_wavfromtencent_record_helper.Testing
convert_to_pcm_wavis used correctly (same pattern asaudio_to_tencent_silk_base64and whisper STT providers)Closes #6853
Summary by Sourcery
Bug Fixes: